Abstract: This paper deals with the design of time-invariant memoryless control policies for robots that move in a finite two-dimensional lattice and are tasked with persistent surveillance of an area in which there are forbidden regions. We model each robot as a controlled Markov chain whose state comprises its position in the lattice and its direction of motion. The goal is to find the minimum number of robots and an associated time-invariant memoryless control policy that guarantees that the largest possible set of states is persistently surveilled without ever visiting a forbidden state. We propose a design method that relies on a finitely parametrized convex program inspired by entropy maximization principles. Numerical examples are provided.

I. Introduction 

We develop a method to design memoryless controllers 
for robots that move in a finite two-dimensional lattice. The 
goal is to achieve persistent surveillance. The term "persistent 
surveillance" is used to denote the task of continuously visiting 
the largest possible set of points in the lattice. In our setup, we 
also impose safety constraints that dictate that certain regions 
are forbidden. The forbidden regions may represent areas in 
which robots cannot operate (such as bodies of water) or are 
not allowed to visit (such as restricted airspace). The goal 
is to deploy the minimum number of robots equipped with 
a control policy that guarantees persistent surveillance of the 
largest possible set of lattice points without ever visiting a 
forbidden region. The memoryless strategies proposed here are 
applicable to miniature robots that have severe computational 
constraints. 

The concept of persistent surveillance is similar to the concept of coverage [5], but differs from it in that the area to be surveilled must be revisited infinitely many times. Control design for persistent surveillance has been studied in [12], [13], where a semi-heuristic control policy that minimizes the time between visitations to the same region is proposed, and in [10], which proposes an algorithm for persistent surveillance of a convex polygon in the plane. These approaches, however, are not restricted to memoryless policies and do not consider safety constraints. On the implementation front, system architectures for unmanned aerial vehicles have been designed specifically for persistent surveillance purposes [?], [11].

In this paper, we model each robot as a fully-observed controlled Markov chain with finite state and control spaces. This approach, which has been successfully used in the context of navigation and path planning (such as in [15], [11], [16], [4]), allows for the development of robust and highly scalable algorithms. Without loss of generality, we consider robots whose state comprises their position on a finite two-dimensional lattice and their direction of motion (taken from a set of four possible orientations), and we limit the control space to two control actions ("forward" and "turn right"). The limitation in the control space illustrates how constrained actuation can be incorporated in our formulation. It is important to highlight, however, that the ideas described in this paper can be extended to more general dynamics and state/control spaces.

We use a recent result in [1] to compute the largest set of states that can be persistently surveilled under safety constraints, together with an associated memoryless control policy. The proposed solution relies on a finitely parametrized convex program, which is highly scalable and can be efficiently solved by standard convex optimization tools, such as [8]. The approach is based on the fact that the probability mass function that maximizes the entropy under convex constraints has maximal support [6]. We also show that the minimum number of robots needed to perform persistent surveillance of the largest set of states (without ever violating the safety constraint) is equal to the number of recurrent classes of the closed-loop Markov chain under the control policy computed by the proposed convex program. The recurrent classes can be found by traversing the graph of the closed-loop Markov chain.

The remainder of this paper is organized as follows. Section II provides notation, basic definitions and the problem statements. The convex program that computes the maximal set of persistently surveilled states and its associated control policy is presented in Section III. Section IV provides details on computing the smallest deployment of robots necessary for maximal persistent surveillance. We discuss limiting behavior and the use of additional constraints in Section V. Conclusions are provided in Section VI. Numerical examples are given throughout the paper to illustrate concepts and the proposed methodology.

II. Preliminaries and Problem Statements 
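The following notation is used throughout the paper:

$\mathbb{X} \times \mathbb{Y}$ : set of lattice positions
$\mathbb{O}$ : set of orientations
$\mathbb{S} \overset{\mathrm{def}}{=} \mathbb{X} \times \mathbb{Y} \times \mathbb{O}$ : set of robot states
$\mathbb{F} \subset \mathbb{S}$ : set of forbidden states
$\mathbb{U}$ : set of control actions
$S_k$ : state of the robot at time $k$
$U_k$ : control action at time $k$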

The state of the robot will be graphically represented as shown in Fig. 1.

Fig. 1: Graphical representation of the state of the robot. In this example, we use $\mathbb{X} = \{1, 2, 3\}$, $\mathbb{Y} = \{1, 2\}$, and $\mathbb{O} = \{R, U, L, D\}$, where R, U, L and D represent the right, up, left and down directions, respectively.
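As a concrete illustration of the state space, the following minimal sketch enumerates $\mathbb{S}$ for the lattice of Fig. 1 and applies a noise-free version of the two actions. The helper names, the coordinate convention (moving up increases $y$), the reading of "turn right" as a clockwise rotation in place, and the rule for moves off the lattice are all assumptions made for illustration. The paper's actual transition probabilities $Q$ are specified separately (Example 3.2).

```python
from itertools import product

# Orientations as in Fig. 1: R(ight), U(p), L(eft), D(own).
ORIENTATIONS = ("R", "U", "L", "D")
# Assumed unit displacement per heading (x grows to the right, y grows upward).
HEADING = {"R": (1, 0), "U": (0, 1), "L": (-1, 0), "D": (0, -1)}
# Assumed meaning of "turn right": a clockwise rotation in place.
TURN_RIGHT = {"R": "D", "D": "L", "L": "U", "U": "R"}


def lattice_states(nx, ny):
    """Enumerate S = X x Y x O with X = {1, ..., nx} and Y = {1, ..., ny}."""
    return [
        (x, y, o)
        for x, y, o in product(range(1, nx + 1), range(1, ny + 1), ORIENTATIONS)
    ]


def nominal_step(state, action, nx, ny):
    """Hypothetical noise-free successor of `state` under `action`.

    A "forward" move that would leave the lattice keeps the robot in place;
    this boundary rule is an assumption, since the paper instead assigns
    probabilities to transitions at the edge of the lattice.
    """
    x, y, o = state
    if action == "turn right":
        return (x, y, TURN_RIGHT[o])
    dx, dy = HEADING[o]
    new_x, new_y = x + dx, y + dy
    if 1 <= new_x <= nx and 1 <= new_y <= ny:
        return (new_x, new_y, o)
    return state


# The lattice of Fig. 1 has 3 * 2 positions and 4 orientations, i.e. 24 states.
print(len(lattice_states(3, 2)))  # 24
```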

The robot's dynamics are governed by the (conditional) probability of $S_{k+1}$ given the current state $S_k$ and control action $U_k$, and are denoted as:

$$Q(s^+, s, u) \overset{\mathrm{def}}{=} P(S_{k+1} = s^+ \mid S_k = s, U_k = u),$$

where $s, s^+ \in \mathbb{S}$ and $u \in \mathbb{U}$.

We denote any memoryless control policy by

$$\mathcal{K}(u, s) \overset{\mathrm{def}}{=} P(U_k = u \mid S_k = s), \quad u \in \mathbb{U},\ s \in \mathbb{S},$$

where $\sum_{u \in \mathbb{U}} \mathcal{K}(u, s) = 1$ for all $s$ in $\mathbb{S}$. The set of all such policies is denoted as $\mathbb{K}$. Note that the computation of a control action may be deterministic (when $\mathcal{K}(u, s) = 1$ for a given action $u$) or carried out in a randomized manner, in which case the policy dictates the probabilities assigned to each control action for a given state.

Assumptions:

• Throughout the paper we assume that $Q$ is given. Hence, all quantities and sets that depend on the closed-loop behavior may be indexed only by the underlying control policy $\mathcal{K}$.

• When multiple robots are considered, we assume that they are identical and have dynamics governed by $Q$. In these situations, every robot executes the same control policy. Moreover, multiple robots are allowed to occupy the same position.

Given a control policy $\mathcal{K}$, the conditional state transition probability of the closed loop is represented as:

$$P_{\mathcal{K}}(S_{k+1} = s^+ \mid S_k = s) \overset{\mathrm{def}}{=} \sum_{u \in \mathbb{U}} Q(s^+, s, u)\, \mathcal{K}(u, s).$$

We will also refer to this quantity as $Q_{\mathcal{K}}(s^+, s)$.
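The closed-loop kernel is a policy-weighted mixture of the open-loop kernels. The following is a minimal Python sketch, assuming $Q$ and $\mathcal{K}$ are stored as nested dictionaries; this storage format, the function name and the toy states and actions are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def closed_loop(Q, K):
    """Return Q_K as {s: {s_plus: prob}}, with Q_K(s+, s) = sum_u Q(s+, s, u) K(u, s).

    Q maps (s, u) -> {s_plus: prob}, i.e. Q(s+, s, u) stored by (state, action).
    K maps s -> {u: prob}, i.e. K(u, s) stored by state.
    """
    QK = {}
    for s, action_probs in K.items():
        row = defaultdict(float)
        for u, k_us in action_probs.items():
            if k_us == 0.0:
                continue  # actions never chosen at s contribute nothing
            for s_plus, q in Q[(s, u)].items():
                row[s_plus] += q * k_us
        QK[s] = dict(row)
    return QK


# Toy example with two states (a, b) and two actions (f, t).
Q = {("a", "f"): {"b": 1.0}, ("a", "t"): {"a": 1.0},
     ("b", "f"): {"a": 1.0}, ("b", "t"): {"a": 0.5, "b": 0.5}}
K = {"a": {"f": 0.5, "t": 0.5}, "b": {"f": 1.0, "t": 0.0}}
print(closed_loop(Q, K))  # {'a': {'b': 0.5, 'a': 0.5}, 'b': {'a': 1.0}}
```

Each row of the result sums to one whenever the rows of $Q$ and $\mathcal{K}$ do, as expected of a transition kernel.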



A. Recurrence and Persistent Surveillance 

A state $s \in \mathbb{S}$ is recurrent under a control policy $\mathcal{K}$ if the probability of a robot revisiting state $s$ is one, that is:

$$P_{\mathcal{K}}(S_k = s \text{ for some } k \geq 1 \mid S_0 = s) = 1. \tag{1}$$



We define the set of recurrent states under control policy $\mathcal{K}$ as follows:

$$\mathbb{S}_{\mathcal{K}} \overset{\mathrm{def}}{=} \{ s \in \mathbb{S} : \text{(1) holds} \}.$$
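For a finite chain, condition (1) can be checked combinatorially: a state $s$ is recurrent exactly when every state reachable from $s$ can reach $s$ again (that is, when its communicating class is closed). The following sketch uses this standard graph criterion rather than evaluating the probability in (1) directly. The dictionary input format and the function names are hypothetical.

```python
def reachable(P, start):
    """States reachable from `start` in zero or more steps along edges with prob > 0."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for t, p in P.get(s, {}).items():
            if p > 0 and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def recurrent_states(P):
    """S_K for a finite chain: s is recurrent iff every state reachable from s reaches s."""
    states = set(P) | {t for row in P.values() for t in row}
    reach = {s: reachable(P, s) for s in states}
    return {s for s in states if all(s in reach[t] for t in reach[s])}


# Toy closed loop: b <-> c form a closed class; a and d eventually leave for good.
P = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {"b": 1.0}, "d": {"d": 0.5, "a": 0.5}}
print(sorted(recurrent_states(P)))  # ['b', 'c']
```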



Remark 2.1: Membership in $\mathbb{S}_{\mathcal{K}}$ guarantees that once a state is visited, it will be revisited infinitely many times under control policy $\mathcal{K}$. It does not, however, guarantee that each state in $\mathbb{S}_{\mathcal{K}}$ will be visited for all initial states in $\mathbb{S}_{\mathcal{K}}$, because $\mathbb{S}_{\mathcal{K}}$ may contain multiple recurrent classes. In fact, a robot initialized at a recurrent state will visit a recurrent state $s$ with probability one if and only if both states belong to the same recurrent class. Moreover, note that once a robot enters a recurrent class, it will never exit under control policy $\mathcal{K}$.

We say a state $s$ is persistently surveilled under control policy $\mathcal{K}$ and initial state $s_0 \in \mathbb{S}$ if it is recurrent under $\mathcal{K}$ and

$$P_{\mathcal{K}}(S_k = s \text{ for some } k \geq 1 \mid S_0 = s_0) = 1. \tag{2}$$

If a state $s$ is persistently surveilled under control policy $\mathcal{K}$ and initial state $s_0 \in \mathbb{S}_{\mathcal{K}}$, then it must be that $s$ and $s_0$ belong to the same recurrent class.

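We define the set of persistently surveilled states $\mathbb{S}^{ps}_{s_0,\mathcal{K}}$ under control policy $\mathcal{K}$ and initial state $s_0 \in \mathbb{S}_{\mathcal{K}}$ to be:

$$\mathbb{S}^{ps}_{s_0,\mathcal{K}} \overset{\mathrm{def}}{=} \{ s \in \mathbb{S}_{\mathcal{K}} : \text{(2) holds} \}.$$

The set $\mathbb{S}^{ps}_{s_0,\mathcal{K}}$ is a recurrent class of the closed-loop dynamics $Q_{\mathcal{K}}$. Note that for every state $s$ in $\mathbb{S}^{ps}_{s_0,\mathcal{K}}$, it holds that $\mathbb{S}^{ps}_{s,\mathcal{K}} = \mathbb{S}^{ps}_{s_0,\mathcal{K}}$. Moreover, if there exists a recurrent state $s_0$ for which $\mathbb{S}^{ps}_{s_0,\mathcal{K}} = \mathbb{S}_{\mathcal{K}}$, the set $\mathbb{S}_{\mathcal{K}}$ has only one recurrent class.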



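Given a set $\mathbb{F}$ of forbidden states, we now define the set of states that are recurrent and for which the probability of transitioning into $\mathbb{F}$ is zero. The set of $\mathbb{F}$-safe recurrent states $\mathbb{S}_{\mathcal{K},\mathbb{F}}$ under a control policy $\mathcal{K}$ is defined as:

$$\mathbb{S}_{\mathcal{K},\mathbb{F}} \overset{\mathrm{def}}{=} \{ s \in \mathbb{S}_{\mathcal{K}} : Q_{\mathcal{K}}(s^+, s) = 0,\ s^+ \in \mathbb{F} \}.$$

We define the maximal set of $\mathbb{F}$-safe recurrent states as:

$$\mathbb{S}_{\mathbb{F}} \overset{\mathrm{def}}{=} \bigcup_{\mathcal{K} \in \mathbb{K}} \mathbb{S}_{\mathcal{K},\mathbb{F}}.$$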
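Given $Q_{\mathcal{K}}$ and the recurrent set $\mathbb{S}_{\mathcal{K}}$ (computed, for instance, with a reachability test), the $\mathbb{F}$-safety filter in the definition of $\mathbb{S}_{\mathcal{K},\mathbb{F}}$ is a one-step check. The following is a minimal sketch that uses a hypothetical dictionary format and function name.

```python
def safe_recurrent(QK, recurrent, forbidden):
    """S_{K,F}: states of S_K whose one-step probability of entering F is zero.

    QK: closed-loop kernel {s: {s_plus: prob}}; recurrent: the set S_K;
    forbidden: the set F (all formats are illustrative assumptions).
    """
    return {
        s for s in recurrent
        if all(QK.get(s, {}).get(f, 0.0) == 0.0 for f in forbidden)
    }


# Toy closed loop: {a, b} is a recurrent cycle, c is transient, x is absorbing and forbidden.
QK = {"a": {"b": 1.0}, "b": {"a": 1.0}, "c": {"c": 0.5, "x": 0.5}, "x": {"x": 1.0}}
print(sorted(safe_recurrent(QK, {"a", "b", "x"}, {"x"})))  # ['a', 'b']
```

Note that $\mathbb{S}_{\mathbb{F}}$ itself cannot be obtained by enumerating policies in this way, because $\mathbb{K}$ contains a continuum of randomized policies. Computing it is the purpose of the convex program in Section III.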

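Finally, the set of $\mathbb{F}$-safe persistently surveilled states $\mathbb{S}^{ps}_{s_0,\mathcal{K},\mathbb{F}}$ under a control policy $\mathcal{K}$ and initial state $s_0 \in \mathbb{S}_{\mathcal{K}}$ is defined as:

$$\mathbb{S}^{ps}_{s_0,\mathcal{K},\mathbb{F}} \overset{\mathrm{def}}{=} \{ s \in \mathbb{S}^{ps}_{s_0,\mathcal{K}} : Q_{\mathcal{K}}(s^+, s) = 0,\ s^+ \in \mathbb{F} \}.$$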



Remark 2.2: As before, $\mathbb{S}^{ps}_{s_0,\mathcal{K},\mathbb{F}}$ is a (safe) recurrent class.



B. Problem Statement 

We start by addressing the following problem:

Problem 2.3: (Maximal set of $\mathbb{F}$-safe recurrent states). Given a set of forbidden states $\mathbb{F}$, determine:

(a) $\mathbb{S}_{\mathbb{F}}$; and

(b) a control policy $\mathcal{K}^*$ such that $\mathbb{S}_{\mathcal{K}^*,\mathbb{F}} = \mathbb{S}_{\mathbb{F}}$.

In light of Remark 2.1, note that in order to persistently surveil all possible states, we need to determine how many robots to use and in which states they should be initialized. The following problem addresses this issue.

Problem 2.4: (Maximal $\mathbb{F}$-safe persistent surveillance). Given a set of forbidden states $\mathbb{F}$, determine the minimum number of robots $r$, a control policy $\mathcal{K}$ and a set of initial states $\{s^1, \ldots, s^r\}$ so that

$$\bigcup_{i=1}^{r} \mathbb{S}^{ps}_{s^i,\mathcal{K},\mathbb{F}} = \mathbb{S}_{\mathbb{F}}. \tag{3}$$



Remark 2.5: The following is a list of important comments on Problems 2.3 and 2.4:

• There is no $\mathcal{K}$ such that the states in $\mathbb{S} \setminus \mathbb{S}_{\mathbb{F}}$ can be $\mathbb{F}$-safe and recurrent.

• Once $r$ robots are initialized with initial states $\{s^1, \ldots, s^r\}$, it is guaranteed that the largest possible set of states will be visited infinitely many times without ever visiting a forbidden state.

We will propose a convex optimization problem that efficiently computes $\mathbb{S}_{\mathbb{F}}$ and a control policy $\mathcal{K}^*$ such that $\mathbb{S}_{\mathcal{K}^*,\mathbb{F}} = \mathbb{S}_{\mathbb{F}}$. We will show that the minimum number of robots $r$ required to persistently surveil $\mathbb{S}_{\mathbb{F}}$ is the number of distinct recurrent classes of the closed-loop Markov chain under the computed control policy $\mathcal{K}^*$.

III. Computing the Maximal Set of Recurrent 
States: A Convex Approach 

Let $\mathbb{P}_{\mathbb{S}\mathbb{U}}$ be the set of all probability mass functions (pmfs) with support in $\mathbb{S} \times \mathbb{U}$, and consider the following convex optimization program:

$$f^*_{SU} = \arg\max_{f_{SU} \in \mathbb{P}_{\mathbb{S}\mathbb{U}}} \mathcal{H}(f_{SU}) \tag{4}$$

subject to:

$$\sum_{u^+ \in \mathbb{U}} f_{SU}(s^+, u^+) = \sum_{s \in \mathbb{S},\, u \in \mathbb{U}} Q(s^+, s, u)\, f_{SU}(s, u), \quad s^+ \in \mathbb{S} \tag{5}$$

$$\sum_{u \in \mathbb{U}} f_{SU}(s, u) = 0, \quad s \in \mathbb{F} \tag{6}$$

where $\mathcal{H} : \mathbb{P}_{\mathbb{S}\mathbb{U}} \rightarrow \mathbb{R}_{\geq 0}$ is the entropy of $f_{SU}$, and is given by

$$\mathcal{H}(f_{SU}) = -\sum_{s \in \mathbb{S}} \sum_{u \in \mathbb{U}} f_{SU}(s, u) \ln\big(f_{SU}(s, u)\big),$$

where we adopt the standard convention that $0 \ln(0) = 0$.

The following proposition, which has been modified from Theorem 3.1 in [1], provides a solution to Problem 2.3.

Proposition 3.1: Let $\mathbb{F}$ be given, assume that (4)-(6) is feasible, and that $f^*_{SU}$ is the optimal solution. In addition, adopt the marginal pmf $f^*_S(s) = \sum_{u \in \mathbb{U}} f^*_{SU}(s, u)$ and consider that $\mathcal{G} : \mathbb{U} \times \mathbb{S} \rightarrow [0, 1]$ is any function satisfying $\sum_{u \in \mathbb{U}} \mathcal{G}(u, s) = 1$ for all $s$ in $\mathbb{S}$. The following holds:

(a) $\mathbb{S}_{\mathbb{F}} = \mathbb{W}_{f^*_S}$

(b) $\mathbb{S}_{\mathcal{K}^*,\mathbb{F}} = \mathbb{S}_{\mathbb{F}}$ for $\mathcal{K}^*$ given by:

$$\mathcal{K}^*(u, s) = \begin{cases} \dfrac{f^*_{SU}(s, u)}{f^*_S(s)}, & s \in \mathbb{W}_{f^*_S} \\ \mathcal{G}(u, s), & \text{otherwise} \end{cases} \qquad (u, s) \in \mathbb{U} \times \mathbb{S} \tag{7}$$

where $\mathbb{W}_{f^*_S}$ is the support of $f^*_S$ and is given by $\mathbb{W}_{f^*_S} = \{ s \in \mathbb{S} : f^*_S(s) > 0 \}$.

Comments on the proof of Proposition 3.1: The proof of Proposition 3.1 closely follows the proof of Theorem 3.1 in [1] and is omitted. However, it is important to highlight that constraint (5) enforces recurrence (it makes $f_S$ a stationary pmf of the closed loop induced by (7), and in a finite chain every state with positive stationary probability is recurrent) and constraint (6) enforces $\mathbb{F}$-safety. Moreover, note that the pmf that maximizes the entropy under convex constraints has maximal support (see Lemma 3.5 in [1]).

Example 3.2: Let $\mathbb{X} = \mathbb{Y} = \{1, \ldots, 5\}$, $\mathbb{O} = \{R, U, L, D\}$ and consider a robot whose action space is given by $\mathbb{U} = \{\text{forward}, \text{turn right}\}$. Moreover, let the set of forbidden states be given by $\mathbb{F} = \{(x, y, \theta) \in \mathbb{S} : (x, y) \in \{(1,1), (1,5), (5,1), (5,5), (3,3)\}\}$, which means the robot is prohibited from visiting the center or corner locations of the lattice.

In order to specify $Q$, we first define an auxiliary conditional pmf $Q'$ on $\mathbb{X}' = \mathbb{Y}' = \{1, 2, 3\}$ and $\mathbb{O}' = \{R, U, L, D\}$. For clarity, $Q'$ is shown graphically in Fig. 2, which contains the probabilities of transitioning to the states shown as dark triangles given the previous state shown as a white triangle. There is uncertainty only for transitions that occur on the edge of the lattice. Since we consider dynamics that are spatially invariant, the transition probabilities for states not shown in Fig. 2 can be computed by appropriate manipulation of the ones shown. Similarly, $Q$ is constructed by appropriate expansion of $Q'$.

We use [8] to solve (4)-(6) and use Proposition 3.1 to compute $\mathbb{S}_{\mathbb{F}}$ and a control policy $\mathcal{K}^*$ such that $\mathbb{S}_{\mathcal{K}^*,\mathbb{F}} = \mathbb{S}_{\mathbb{F}}$. The set $\mathbb{S}_{\mathbb{F}}$ can be seen in Fig. 3, where the areas in red represent the states in $\mathbb{F}$, and the triangles in blue represent the states in $\mathbb{S}_{\mathbb{F}}$. The control policy $\mathcal{K}^*$, computed using (7), has been omitted due to space constraints.

IV. Maximal Persistent Surveillance and Robot Deployment

In this section, we provide a solution to Problem 2.4, which seeks the minimum number $r$ of robots, a control policy $\mathcal{K}$ and a set of initial states $\{s^1, \ldots, s^r\}$ so that $\bigcup_{i=1}^{r} \mathbb{S}^{ps}_{s^i,\mathcal{K},\mathbb{F}} = \mathbb{S}_{\mathbb{F}}$.

In light of Remark 2.2, recall that the set of $\mathbb{F}$-safe persistently surveilled states $\mathbb{S}^{ps}_{s_0,\mathcal{K},\mathbb{F}}$ is a recurrent class of $Q_{\mathcal{K}}$. In practice, this means that when a robot with initial state $S_0 = s_0$ applies control policy $\mathcal{K}$, it is guaranteed that:

• the robot will never leave $\mathbb{S}^{ps}_{s_0,\mathcal{K},\mathbb{F}}$;

• every state in $\mathbb{S}^{ps}_{s_0,\mathcal{K},\mathbb{F}}$ will be visited infinitely many times;

• states in $\mathbb{F}$ will never be visited.
Fig. 2: Graphical representation of some transitions in $Q'$.

Fig. 3: Depiction of $\mathbb{S}_{\mathbb{F}}$ in blue. The red areas represent the forbidden states.



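To find all the (safe) recurrent classes in $\mathbb{S}_{\mathcal{K},\mathbb{F}}$, flood-fill-type algorithms may be used, where the graph of $Q_{\mathcal{K}}$ is traversed in either a depth-first or a breadth-first manner. An edge from $s$ to $s^+$ of the graph of $Q_{\mathcal{K}}$ exists if and only if $Q_{\mathcal{K}}(s^+, s) > 0$ holds. Because every state in $\mathbb{S}_{\mathcal{K},\mathbb{F}}$ is recurrent, the set of states reachable from any such state is exactly its recurrent class. Note that states in $\mathbb{S} \setminus (\mathbb{S}_{\mathcal{K},\mathbb{F}} \cup \mathbb{F})$ do not need to be searched.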
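A breadth-first variant of the flood fill described above can be sketched as follows. The function name and the dictionary input format are hypothetical, and the sketch assumes that $\mathbb{S}_{\mathcal{K},\mathbb{F}}$ has already been computed. For $\mathcal{K}^*$, this set is the support of $f^*_S$ by Proposition 3.1.

```python
from collections import deque


def recurrent_classes(QK, safe_recurrent):
    """Partition S_{K,F} into recurrent classes with a breadth-first flood fill.

    QK: closed-loop kernel {s: {s_plus: prob}} (assumed format).
    safe_recurrent: the set S_{K,F}; every state in it is recurrent, so the
    states reachable from it form exactly its recurrent class.
    """
    visited, classes = set(), []
    for start in sorted(safe_recurrent):  # sorted only for a deterministic order
        if start in visited:
            continue
        component, queue = {start}, deque([start])
        while queue:
            s = queue.popleft()
            for s_plus, p in QK.get(s, {}).items():
                # Edge s -> s+ exists iff Q_K(s+, s) > 0.
                if p > 0 and s_plus not in component:
                    component.add(s_plus)
                    queue.append(s_plus)
        visited |= component
        classes.append(component)
    return classes


# Toy closed loop with two recurrent classes: {a, b} and {c, d}.
QK = {"a": {"b": 1.0}, "b": {"a": 1.0}, "c": {"c": 0.5, "d": 0.5}, "d": {"c": 1.0}}
classes = recurrent_classes(QK, {"a", "b", "c", "d"})
robots = len(classes)                           # one robot per class
initial_states = [min(cls) for cls in classes]  # any state of each class works
print(robots, initial_states)                   # 2 ['a', 'c']
```

The number of returned classes is the number of robots needed under the given policy. Taking any one state from each class gives a valid set of initial states.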



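Given $\mathbb{F}$ and a control policy $\mathcal{K}$, let $n_{\mathcal{K}}$ be the number of distinct (safe) recurrent classes of $Q_{\mathcal{K}}$ in $\mathbb{S}_{\mathcal{K},\mathbb{F}}$, and note that the following holds:

$$\mathbb{S}_{\mathcal{K},\mathbb{F}} = \bigcup_{i=1}^{n_{\mathcal{K}}} \mathbb{S}^{ps}_{s^i,\mathcal{K},\mathbb{F}},$$

where $\{s^1, \ldots, s^{n_{\mathcal{K}}}\}$ is a set of initial states, and $\{\mathbb{S}^{ps}_{s^i,\mathcal{K},\mathbb{F}}\}_{i=1}^{n_{\mathcal{K}}}$ are distinct recurrent classes.

We define the set of all admissible control policies whose $\mathbb{F}$-safe sets of recurrent states are maximal to be:

$$\mathbb{K}_{\mathbb{F}} \overset{\mathrm{def}}{=} \{ \mathcal{K} \in \mathbb{K} : \mathbb{S}_{\mathcal{K},\mathbb{F}} = \mathbb{S}_{\mathbb{F}} \},$$

and note that in order to solve Problem 2.4 we must:

• find a control policy $\tilde{\mathcal{K}}$ in $\mathbb{K}_{\mathbb{F}}$ such that $n_{\tilde{\mathcal{K}}} \leq n_{\mathcal{K}}$ for all $\mathcal{K}$ in $\mathbb{K}_{\mathbb{F}}$. Note that $n_{\tilde{\mathcal{K}}}$ is the minimum number of robots needed for maximal persistent surveillance;

• identify the recurrent classes in $\mathbb{S}_{\tilde{\mathcal{K}},\mathbb{F}}$ (by exploring the graph of $Q_{\tilde{\mathcal{K}}}$); and

• select one (any) state from each of the recurrent classes to compose the set of initial states.

Note that the control policy $\mathcal{K}^*$ given in (7) is a candidate for maximal persistent surveillance since $\mathbb{S}_{\mathcal{K}^*,\mathbb{F}} = \mathbb{S}_{\mathbb{F}}$. The following proposition shows that $n_{\mathcal{K}^*} \leq n_{\mathcal{K}}$ for all $\mathcal{K}$ in $\mathbb{K}_{\mathbb{F}}$.

Proposition 4.1: Let $\mathbb{F}$ be given, and take $\mathcal{K}^*$ to be the control policy in (7). The following holds:

$$n_{\mathcal{K}^*} \leq n_{\mathcal{K}}, \quad \mathcal{K} \in \mathbb{K}_{\mathbb{F}}.$$

Proof: Suppose there exists a control policy $\tilde{\mathcal{K}}$ in $\mathbb{K}_{\mathbb{F}}$ such that $n_{\tilde{\mathcal{K}}} < n_{\mathcal{K}^*}$. Since $\tilde{\mathcal{K}}$ belongs to $\mathbb{K}_{\mathbb{F}}$, there must exist a control policy $\hat{\mathcal{K}}$ with the same sparsity pattern as $\tilde{\mathcal{K}}$ and a pmf $\hat{f}_{SU}$ in $\mathbb{P}_{\mathbb{S}\mathbb{U}}$ that satisfies (5) and (6) for which:

$$\hat{\mathcal{K}}(u, s) = \begin{cases} \dfrac{\hat{f}_{SU}(s, u)}{\hat{f}_S(s)}, & s \in \mathbb{S}_{\mathbb{F}} \\ \mathcal{G}(u, s), & \text{otherwise} \end{cases} \qquad (u, s) \in \mathbb{U} \times \mathbb{S},$$

where $\hat{f}_S(s) = \sum_{u \in \mathbb{U}} \hat{f}_{SU}(s, u)$ and $\mathcal{G} : \mathbb{U} \times \mathbb{S} \rightarrow [0, 1]$.

Since $Q_{\tilde{\mathcal{K}}}$ has fewer recurrent classes than $Q_{\mathcal{K}^*}$, there must exist a pair $(s, u)$ in $\mathbb{S}_{\mathbb{F}} \times \mathbb{U}$ for which $\tilde{\mathcal{K}}(u, s) > 0$ and $\mathcal{K}^*(u, s) = 0$ hold. Since $\tilde{\mathcal{K}}$ and $\hat{\mathcal{K}}$ have the same sparsity pattern, it holds that $\hat{\mathcal{K}}(u, s) > 0$. Therefore, it must be that $\hat{f}_{SU}(s, u) > 0$ and $f^*_{SU}(s, u) = 0$. In other words, the support of $\hat{f}_{SU}$ is not contained in the support of $f^*_{SU}$, which is a contradiction by Lemma 3.5 in [1]. ■

Remark 4.2: Suppose we change the objective function in (4) to $\mathcal{H}(f_S)$ and add the following constraint: $f_S(s) = \sum_{u \in \mathbb{U}} f_{SU}(s, u)$. Note that an appropriate modification of Proposition 3.1 would enable us to find $\mathbb{S}_{\mathbb{F}}$ and an associated control policy (i.e., solve Problem 2.3), with the added benefit that the modified convex program would be computationally less intensive (since fewer calls to the entropy function would be required). However, maximizing the entropy of the marginal distribution $f_S$ would not solve the problem of maximal persistent surveillance, since Proposition 4.1 would not apply: its proof relies on $f^*_{SU}$ having maximal support in $\mathbb{S} \times \mathbb{U}$, whereas maximizing $\mathcal{H}(f_S)$ only guarantees maximal support of the marginal $f_S$.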



Example 4.3: Consider again the example described in Example 3.2. By exploring the graph of $Q_{\mathcal{K}^*}$, we conclude that only one robot is required to perform maximal persistent surveillance (i.e., $\mathbb{S}_{\mathbb{F}}$ contains only one recurrent class). Any state in $\mathbb{S}_{\mathbb{F}}$ may be selected as the robot's initial state.

Suppose that we now change the set of forbidden states to include location (4,3) (i.e., let $\mathbb{F} = \{(x, y, \theta) \in \mathbb{S} : (x, y) \in \{(1,1), (1,5), (5,1), (5,5), (3,3), (4,3)\}\}$). Re-solving (4)-(6), applying Propositions 3.1 and 4.1 and searching the graph of the closed-loop Markov chain, we conclude that at least three robots are required to perform maximal persistent surveillance of $\mathbb{S}_{\mathbb{F}}$ (see Fig. 4). Any state from each recurrent class may be used as an initial state, so we can choose the set of initial states to be $\{(1, 2, U), (2, 1, U), (2, 4, U)\}$. Note that the set $\mathbb{S}_{\mathbb{F}}$ is now smaller than in the previous example (34 vs. 40 states).

Fig. 4: Top left: maximal set of recurrent states (in blue). Others: three recurrent classes whose union is $\mathbb{S}_{\mathbb{F}}$.
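The forbidden sets of Examples 3.2 and 4.3 are products of blocked lattice cells with all four orientations. The following is a short sketch that enumerates them. The helper name is hypothetical, and only the sets of blocked cells come from the examples.

```python
from itertools import product

ORIENTATIONS = ("R", "U", "L", "D")


def forbidden_states(nx, ny, blocked_cells):
    """F = {(x, y, theta) in S : (x, y) in blocked_cells}, i.e. every heading at a blocked cell."""
    return {
        (x, y, o)
        for x, y, o in product(range(1, nx + 1), range(1, ny + 1), ORIENTATIONS)
        if (x, y) in blocked_cells
    }


corners_and_center = {(1, 1), (1, 5), (5, 1), (5, 5), (3, 3)}
F_32 = forbidden_states(5, 5, corners_and_center)             # Example 3.2
F_43 = forbidden_states(5, 5, corners_and_center | {(4, 3)})  # Example 4.3
print(len(F_32), len(F_43))  # 20 24
```

Of the $5 \times 5 \times 4 = 100$ states, 20 (respectively 24) are forbidden. The reported sets $\mathbb{S}_{\mathbb{F}}$ (40 and 34 states) are strict subsets of the remaining states, because only states that can be made both recurrent and $\mathbb{F}$-safe are retained.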

V. Limiting Behavior and Other Constraints 

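We define $\mathcal{T}_{\mathcal{K}}$, the long-term proportion of time the robot, under control policy $\mathcal{K}$, visits state $s$ in $\mathbb{S}$ having started at state $s_0$, to be:

$$\mathcal{T}_{\mathcal{K}}(s, s_0) = \lim_{k \rightarrow \infty} \frac{1}{k} \sum_{i=1}^{k} \mathbf{1}(S_i = s, S_0 = s_0),$$

where $\mathbf{1}$ is the indicator function.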
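Because $\mathcal{T}_{\mathcal{K}}$ is a time average along a single trajectory, it can be estimated by simulation. The following sketch is our own illustration and is not part of the design method. It assumes that the closed-loop kernel is available as a dictionary and uses Python's `random.Random.choices` to sample transitions.

```python
import random


def occupancy(QK, s0, steps, rng):
    """Estimate T_K(s, s0) for all s by (1/k) * sum_{i=1}^{k} 1(S_i = s) along one trajectory.

    QK: closed-loop kernel {s: {s_plus: prob}} (assumed format); rng: a random.Random.
    """
    counts, s = {}, s0
    for _ in range(steps):
        successors = list(QK[s])
        weights = [QK[s][t] for t in successors]
        s = rng.choices(successors, weights=weights)[0]  # draw S_{i+1} ~ Q_K(., S_i)
        counts[s] = counts.get(s, 0) + 1
    return {state: c / steps for state, c in counts.items()}


# A deterministic 3-cycle visits each state exactly one third of the time.
cycle = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {"a": 1.0}}
print(occupancy(cycle, "a", 300, random.Random(0)))  # each value is 1/3
```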



A. Limiting Behavior with One Recurrent Class 

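Given a forbidden set $\mathbb{F}$, let $f^*_{SU}$ be the optimal solution to (4)-(6), let $\mathcal{K}^*$ be the control policy computed in (7), and suppose $\mathbb{S}_{\mathcal{K}^*,\mathbb{F}}$ has only one recurrent class. For any initial state $s_0$ in $\mathbb{S}_{\mathbb{F}}$, the following holds with probability one:

$$\mathcal{T}_{\mathcal{K}^*}(s, s_0) = f^*_S(s), \tag{8}$$

where $f^*_S(s) = \sum_{u \in \mathbb{U}} f^*_{SU}(s, u)$. Since we have not imposed aperiodicity on $Q_{\mathcal{K}^*}$, we cannot claim the stronger statement that $P_{\mathcal{K}^*}(S_k = s \mid S_0 = s_0)$ converges to $f^*_S(s)$; (8) only characterizes time averages. However, equation (8) still provides valuable information regarding the limiting behavior of the robot.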
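In practice, the solver returns $f^*_{SU}$ as a table. The marginal $f^*_S$ appearing in (8) and the policy $\mathcal{K}^*$ of (7) follow from it by summation and normalization. The following sketch assumes a dictionary keyed by $(s, u)$ (an illustrative format) and uses a uniform $\mathcal{G}$ by default. The `tol` parameter is our own addition, because numerical solvers often return tiny positive values instead of exact zeros.

```python
def marginal_and_policy(f_su, actions, fallback=None, tol=0.0):
    """Return (f_S, K*) from a joint pmf f_SU, following (7).

    f_su: {(s, u): probability}, an assumed tabular form of the solution of (4)-(6).
    fallback: G(u, s), used where f_S(s) <= tol; if None, a uniform G is used
    (one admissible choice, not prescribed by the paper).
    tol: support threshold for deciding whether s lies in the support of f_S.
    """
    f_s = {}
    for (s, _u), p in f_su.items():
        f_s[s] = f_s.get(s, 0.0) + p  # f_S(s) = sum_u f_SU(s, u)
    policy = {}
    for s, mass in f_s.items():
        if mass > tol:
            policy[s] = {u: f_su.get((s, u), 0.0) / mass for u in actions}
        elif fallback is not None:
            policy[s] = {u: fallback(u, s) for u in actions}
        else:
            policy[s] = {u: 1.0 / len(actions) for u in actions}
    return f_s, policy


# Toy joint pmf over states a, b, c and actions f, t; state c has zero mass.
f_su = {("a", "f"): 0.2, ("a", "t"): 0.2, ("b", "f"): 0.6,
        ("b", "t"): 0.0, ("c", "f"): 0.0, ("c", "t"): 0.0}
f_s, K_star = marginal_and_policy(f_su, ("f", "t"))
print(f_s)
print(K_star)
```

Only states that appear as keys of `f_su` receive a policy row. States missing from the table would need the fallback $\mathcal{G}$ to be applied separately.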

Note that the pmf that maximizes the entropy is "as uniform as possible" (in fact, when unconstrained, the pmf that maximizes the entropy is uniform). However, additional convex constraints can be added to our formulation in order to shape the distribution of the optimal pmf and, thus, influence the limiting behavior of the robot.
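The objective $\mathcal{H}$ of (4), with the convention $0 \ln(0) = 0$, can be evaluated directly. The short sketch below (the function name is hypothetical) also shows why the maximizer is "as uniform as possible". On four outcomes, the uniform pmf attains $\ln 4$, while a skewed pmf or one that places zero mass on some outcomes scores lower.

```python
import math


def entropy(pmf):
    """H = -sum_i p_i ln p_i (in nats), using the convention 0 ln 0 = 0."""
    return -sum(p * math.log(p) for p in pmf if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
sparse = [0.5, 0.5, 0.0, 0.0]
print(entropy(uniform))  # ln 4, about 1.386
print(entropy(skewed) < entropy(uniform), entropy(sparse) < entropy(uniform))  # True True
```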

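Consider the following constraint:

$$\sum_{(x, y) \in \mathbb{D},\, \theta \in \mathbb{O}} \ \sum_{u \in \mathbb{U}} f_{SU}((x, y, \theta), u) \geq \alpha, \tag{9}$$

where $\mathbb{D} \subset \mathbb{X} \times \mathbb{Y}$ is a region of the lattice. The set $\mathbb{D}$ can be interpreted as a region of high interest that should be surveilled more often. Suppose the convex program (4)-(6) together with (9) is feasible, that $f^{**}_{SU}$ is the optimal solution and that $\mathcal{K}^{**}$ is the associated control policy. Assuming, as above, that the closed loop has a single recurrent class, the following holds for any $s_0$ in $\mathbb{S}_{\mathbb{F}}$ with probability one:

$$\sum_{(x, y) \in \mathbb{D},\, \theta \in \mathbb{O}} \mathcal{T}_{\mathcal{K}^{**}}((x, y, \theta), s_0) \geq \alpha.$$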
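Since summing $f_{SU}$ over $u$ yields the marginal $f_S$, the left-hand side of (9) can be evaluated from $f_S$ alone. The following sketch checks a candidate solution against (9). The helper names are hypothetical, and the small tolerance for solver round-off is our own assumption.

```python
def region_mass(f_s, region):
    """Left-hand side of (9): total f_S mass on states (x, y, theta) with (x, y) in D."""
    return sum(p for (x, y, _theta), p in f_s.items() if (x, y) in region)


def satisfies_region_constraint(f_s, region, alpha, tol=1e-9):
    """Check (9) up to a small tolerance (assumed, to absorb solver round-off)."""
    return region_mass(f_s, region) >= alpha - tol


# Toy marginal over four states on a small lattice.
f_s = {(1, 1, "R"): 0.1, (2, 2, "U"): 0.5, (2, 2, "D"): 0.3, (3, 3, "L"): 0.1}
D = {(2, 2)}
print(satisfies_region_constraint(f_s, D, 0.75))  # True (mass on D is 0.8)
```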

Example 5.1: Let $\mathbb{X} = \mathbb{Y} = \{1, \ldots, 10\}$, $\mathbb{O} = \{R, U, L, D\}$, and consider again a robot whose action space is given by $\mathbb{U} = \{\text{forward}, \text{turn right}\}$. The dynamics $Q$ are similar to those used in Examples 3.2 and 4.3, except that we add uncertainty to the transitions of states that lie in the interior of the grid (see Fig. 5). The probabilities for states on the edge of the grid are the same as before (see Fig. 2).
Fig. 5: Graphical representation of some transitions in $Q'$.

Consider the set of forbidden states $\mathbb{F} = \{(x, y, \theta) \in \mathbb{S} : (x, y) \in \{(2,2), (2,3), (3,2), (3,3), (8,8), (8,9), (9,8), (9,9)\}\}$. We solve (4)-(6) using the tool in [8]. In Fig. 6, each state that belongs to $\mathbb{S}_{\mathbb{F}}$ is shown in blue, where the darker the blue, the higher the value of $f^*_S$. Note that the distribution is relatively uniform.

Fig. 6: Depiction of $\mathbb{S}_{\mathbb{F}}$ in blue. Darker blue indicates a higher value for $f^*_S$.

Consider now $\mathbb{D} = \{(x, y) \in \mathbb{X} \times \mathbb{Y} : 3 \leq x, y \leq 8\}$, and let $\alpha = 0.75$. We solve (4)-(6) and (9). The result can be seen in Fig. 7.

Fig. 7: Depiction of $\mathbb{S}_{\mathbb{F}}$ in blue with additional constraint (9). Darker blue indicates a higher value for $f^{**}_S$.

B. Limiting Behavior with Multiple Recurrent Classes

Consider again $f^*_{SU}$ and $\mathcal{K}^*$ as before, and, without loss of generality, let $\mathbb{S}_{\mathcal{K}^*,\mathbb{F}}$ have two recurrent classes with initial states $s^1$ and $s^2$. For any initial state $s_0$ in $\mathbb{S}^{ps}_{s^1,\mathcal{K}^*,\mathbb{F}}$ (equiv., $\mathbb{S}^{ps}_{s^2,\mathcal{K}^*,\mathbb{F}}$), the following holds with probability one for every state $s$ in the same recurrent class:

$$\mathcal{T}_{\mathcal{K}^*}(s, s_0) = \frac{f^*_S(s)}{\beta}, \tag{10}$$

where $\beta = \sum_{s \in \mathbb{S}^{ps}_{s^1,\mathcal{K}^*,\mathbb{F}}} f^*_S(s)$ (equiv. $\beta = \sum_{s \in \mathbb{S}^{ps}_{s^2,\mathcal{K}^*,\mathbb{F}}} f^*_S(s)$).

With equation (10) in mind, note that additional convex constraints may also be used to influence the limiting behavior of the robots. Moreover, by carefully selecting the number of robots allocated to each recurrent class, one can achieve a desirable limiting behavior for the ensemble of robots.
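Equation (10) amounts to renormalizing $f^*_S$ within each recurrent class. The following sketch (helper names hypothetical) computes these per-class fractions. As one possible reading of the remark on robot allocation, it also computes the long-run average number of robots at each state when $m_i$ robots are assigned to class $i$. That second quantity is our own illustration and is not a result stated in the paper.

```python
def per_class_occupancy(f_s, classes):
    """Limiting time fractions of (10): f_S(s) / beta within each recurrent class."""
    fractions = {}
    for cls in classes:
        beta = sum(f_s[s] for s in cls)  # total f_S mass of this class
        for s in cls:
            fractions[s] = f_s[s] / beta
    return fractions


def ensemble_occupancy(f_s, classes, robots_per_class):
    """Hypothetical extension: long-run average number of robots at each state
    when robots_per_class[i] robots are initialized in classes[i]."""
    fractions = per_class_occupancy(f_s, classes)
    return {
        s: m * fractions[s]
        for cls, m in zip(classes, robots_per_class)
        for s in cls
    }


f_s = {"a": 0.1, "b": 0.3, "c": 0.2, "d": 0.4}
classes = [{"a", "b"}, {"c", "d"}]
print(per_class_occupancy(f_s, classes))         # a: 0.25, b: 0.75, c: 1/3, d: 2/3
print(ensemble_occupancy(f_s, classes, [1, 3]))  # a: 0.25, b: 0.75, c: 1.0, d: 2.0
```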

VI. Conclusions 

We have proposed methods to design memoryless strategies for controlled Markov chains that guarantee maximal persistent surveillance properties under safety constraints. The uncomplicated structure of the resulting controllers makes them implementable in small robots. We have described a finitely parametrized convex program that solves this problem via entropy maximization principles, and we have shown that the computed control policy results in a closed-loop Markov chain with the least number of recurrent classes among all policies that achieve the maximal set of $\mathbb{F}$-safe recurrent states.